feat(monitoring): expose share rejection metrics on Prometheus surface #475
Open
gimballock wants to merge 2 commits into stratum-mining:main from
added 2 commits
May 2, 2026 20:54
…hotCache::refresh (stratum-mining#337)
Move all Prometheus gauge updates (set + stale-label removal) out of the /metrics HTTP handler and into SnapshotCache::refresh(), which runs as a periodic background task. This eliminates the GaugeVec reset gap where label series momentarily disappeared on every scrape.
Changes:
- SnapshotCache now owns PrometheusMetrics and PreviousLabelSets
- refresh() updates snapshot data AND Prometheus gauges atomically
- /metrics handler reduced to: set uptime gauge, gather, encode
- ServerState simplified (no more PreviousLabelSets or Mutex)
- Tests updated to wire metrics through cache via with_metrics()
- Integration tests: replace fixed-sleep assertions with poll_until_metric_gte (100ms poll, 5s deadline) for CI resilience
- Clone impl preserves previous_labels for correct stale-label detection
- debug-level tracing on stale label removal errors
- debug_assert on with_metrics double-attachment
Closes stratum-mining#337
Add Prometheus gauges for share submission and rejection data that was
previously only available through the JSON monitoring API:
Server metrics:
- sv2_server_shares_submitted_total{channel_id, user_identity}
- sv2_server_shares_rejected_total{channel_id, user_identity, error_code}
Client metrics:
- sv2_client_shares_rejected_total{client_id, channel_id, user_identity}
These enable time-series alerting on rejection rates and per-reason
breakdown (stale, duplicate-share, etc.) via rate() queries and
recording rules.
Implementation:
- Register new GaugeVecs in PrometheusMetrics
- Populate from existing shares_rejected HashMap (server) and
rejected_shares u32 (client) in SnapshotCache::update_metrics
- Track server rejection label triples in PreviousLabelSets for
stale series cleanup
- Add shares_rejected field to client ExtendedChannelInfo and
StandardChannelInfo (defaults to 0 until stratum#2119 adds
rejection tracking to server-side ShareAccounting)
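The server-side population and stale-series cleanup described above can be sketched roughly as follows. This is a std-only illustration, not the code in this PR: ChannelSnapshot, rejection_samples, and stale_labels are hypothetical names, and the real implementation works against GaugeVecs and PreviousLabelSets rather than plain collections.

```rust
use std::collections::{HashMap, HashSet};

/// (channel_id, user_identity, error_code): the label triple used by
/// sv2_server_shares_rejected_total.
type RejectionLabels = (String, String, String);

/// Hypothetical stand-in for one server channel's snapshot data.
struct ChannelSnapshot {
    channel_id: u32,
    user_identity: String,
    /// Rejection counts keyed by error_code, as exposed by the JSON API.
    shares_rejected: HashMap<String, u32>,
}

/// Flatten every channel's shares_rejected map into (labels, value) samples,
/// one per error_code entry.
fn rejection_samples(channels: &[ChannelSnapshot]) -> Vec<(RejectionLabels, u32)> {
    channels
        .iter()
        .flat_map(|ch| {
            ch.shares_rejected.iter().map(move |(code, count)| {
                (
                    (ch.channel_id.to_string(), ch.user_identity.clone(), code.clone()),
                    *count,
                )
            })
        })
        .collect()
}

/// Label triples that were exported on the previous refresh but are absent
/// from the current one; their series should be removed from the gauge.
fn stale_labels(
    previous: &HashSet<RejectionLabels>,
    current: &HashSet<RejectionLabels>,
) -> Vec<RejectionLabels> {
    previous.difference(current).cloned().collect()
}
```

On each refresh, the caller would set one gauge value per sample, remove the series for every triple returned by stale_labels, and then store the current set as the new previous set.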
d847f42 to aa30316
Summary
The JSON API already reports per-channel shares_rejected (HashMap<String, u32> keyed by error_code) and shares_submitted, but none of this data reaches the Prometheus /metrics endpoint. Operators using Prometheus for alerting and dashboards have no way to track share rejection rates or reasons over time.
Motivation
- Rejection rate (rejected / submitted) is the most direct signal that something is wrong (miner misconfiguration, difficulty drift, network latency). Without Prometheus exposure, rate-based alerting is not possible.
- The HashMap<String, u32> already distinguishes reasons (stale, duplicate-share, etc.). Surfacing this per reason lets operators distinguish transient stale-share spikes (normal after a new block) from sustained protocol errors.
- rate() queries and recording rules require Prometheus exposure.
Current State
Server-side Prometheus (what exists):
- sv2_server_shares_accepted_total{channel_id, user_identity}
- No shares_submitted or shares_rejected gauges
Server-side JSON API (what exists but is not in Prometheus):
- shares_submitted: u32
- shares_rejected: HashMap<String, u32>
Client-side:
- sv2_client_shares_accepted_total exists in Prometheus
- stratum-core client ShareAccounting already tracks rejected_shares: u32
Changes
Server metrics
- sv2_server_shares_submitted_total{channel_id, user_identity}: gauge from shares_submitted, provides the rejection-rate denominator
- sv2_server_shares_rejected_total{channel_id, user_identity, error_code}: gauge from iterating shares_rejected map entries
Client metrics
- sv2_client_shares_rejected_total{client_id, channel_id, user_identity}: gauge from upstream ShareAccounting::get_rejected_shares() (scalar u32, no per-reason breakdown available at this layer)
- Adds a shares_rejected: u32 field to ExtendedChannelInfo / StandardChannelInfo in client monitoring types
Stale label cleanup
- Tracks (channel_id, user_identity, error_code) triples in PreviousLabelSets and removes stale combinations on refresh, the same pattern used for existing share/hashrate labels
Cardinality
The error_code label is nominally unbounded (the SV2 spec allows arbitrary strings), but in practice the server ShareValidationError enum defines ~9 well-known variants. Upstream stratum-mining/stratum#2142 is working toward typed error_code constants, which will further bound this.
Client-side limitation
The server-side ShareAccounting in stratum-core does not currently track rejected shares (stratum-mining/stratum#2119). Client monitoring shares_rejected defaults to 0 until that is addressed upstream. The Prometheus gauge and API field are in place for when the data becomes available.
Example PromQL
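A rejection-ratio query over the new series might take the form sum(rate(sv2_server_shares_rejected_total[5m])) / sum(rate(sv2_server_shares_submitted_total[5m])). That expression and its 5m window are assumptions for illustration, not queries taken from this PR. The arithmetic behind such a ratio can be sketched in Rust as below; Sample, per_second_rate, and rejection_ratio are hypothetical names, and the sketch uses only the first and last sample of a window, whereas Prometheus's rate() also handles counter resets and extrapolation.

```rust
/// One scrape of the two server series, summed over all channels.
#[derive(Clone, Copy)]
struct Sample {
    /// Unix timestamp of the scrape, in seconds.
    timestamp_secs: f64,
    shares_submitted: u64,
    shares_rejected: u64,
}

/// Per-second increase between two readings. Returns None if the window is
/// empty or the value went backwards (for example after a restart).
fn per_second_rate(earlier: u64, later: u64, elapsed_secs: f64) -> Option<f64> {
    if elapsed_secs <= 0.0 || later < earlier {
        return None;
    }
    Some((later - earlier) as f64 / elapsed_secs)
}

/// Ratio of the rejected-share rate to the submitted-share rate over the
/// window between two samples.
fn rejection_ratio(first: Sample, last: Sample) -> Option<f64> {
    let elapsed = last.timestamp_secs - first.timestamp_secs;
    let rejected = per_second_rate(first.shares_rejected, last.shares_rejected, elapsed)?;
    let submitted = per_second_rate(first.shares_submitted, last.shares_submitted, elapsed)?;
    if submitted == 0.0 {
        // No shares were submitted in the window, so the ratio is undefined.
        return None;
    }
    Some(rejected / submitted)
}
```

Replacing sum(...) with sum by (error_code)(...) in the numerator of the assumed query would give the per-reason breakdown that the error_code label is meant to enable.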
Testing
All existing tests pass (80 in stratum-apps, 6 in pool, miner-apps all green). Existing test helpers updated for new fields.
Related
- shares_rejected to server monitoring types (merged)
- shares_rejected API compatibility (merged)
- channels_sv2::server::share_accounting::ShareAccounting needs refinement stratum#2119: server-side ShareAccounting refinement (open)
- error_code strings stratum#2142: typed error_code constants (open)
- ShareAccounting work representation from f64 to u64 on channels_sv2 stratum#2092: f64 → u64 work representation (open)